Reconstructing the massive black hole cosmic history through gravitational waves 
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The massive black holes we observe in galaxies today are the natural end-product of a complex 
evolutionary path, in which black holes seeded in proto-galaxies at high redshift grow through 
cosmic history via a sequence of mergers and accretion episodes. Electromagnetic observations 
probe a small subset of the population of massive black holes (namely, those that are active or those 
that are very close to us), but planned space-based gravitational- wave observatories such as the 
Laser Interferometer Space Antenna (LISA) can measure the parameters of "electromagnetically 
invisible" massive black holes out to high redshift. In this paper we introduce a Bayesian framework 
to analyze the information that can be gathered from a set of such measurements. Our goal is to 
connect a set of massive black hole binary merger observations to the underlying model of massive 
black hole formation. In other words, given a set of observed massive black hole coalescences, we 
assess what information can be extracted about the underlying massive black hole population model. 
For concreteness we consider ten specific models of massive black hole formation, chosen to probe 
four important (and largely unconstrained) aspects of the input physics used in structure formation 
simulations: seed formation, metallicity "feedback", accretion efficiency and accretion geometry. For 
the first time we allow for the possibility of "model mixing", by drawing the observed population 
from some combination of the "pure" models that have been simulated. A Bayesian analysis allows 
us to recover a posterior probability distribution for the "mixing parameters" that characterize the 
fractions of each model represented in the observed distribution. Our work shows that LISA has 
enormous potential to probe the underlying physics of structure formation. 

PACS numbers: 04.30.Tv, 04.30.-w, 04.70.-s, 97.60.Lf 
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I. INTRODUCTION 

In ACDM cosmologies, structure formation proceeds 
in a hierarchical fashion in which massive galaxies 
are the result of several merging events involving smaller 
building blocks. In this framework, the massive black 
holes (MBHs) we see in today's galaxies are expected to 
be the natural end-product of a complex evolutionary 
path, in which black holes seeded in proto-galaxies at 
high redshift grow through cosmic history via a sequence 
of MBH-MBH mergers and accretion episodes 0, y| • Hi- 
erarchical models for MBH evolution, associating quasar 
activity to gas- fueled accretion following galaxy mergers, 
have been successful in reproducing several properties of 
the observed Universe, such as the present day mass den- 
sity of nuclear MBHs and the optical and X-ray luminos- 
ity functions of quasars 

However, only a few percent of galaxies host a quasar 
or an active galactic nucleus (AGN), while most galaxies 
harbor MBHs in their centers, as exemplified by stellar- 
and gas-dynamical measurements that led to the dis- 
covery of quiescent MBHs in almost all bright nearby 
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galaxies [10| , including the Milky Way . Our current 
knowledge of the MBH population is therefore limited 
to a small fraction of MBHs: either those that are ac- 
tive, or those in our neighborhood, where stellar- and 
gas-dynamical measurements are possible. Gravitational 
wave (GW) observatories can reveal the population of 
electromagnetically "black" MBHs. 

LISA will be capable of accurately measuring the pa- 
rameters of individual massive black hole binaries (MB- 
HBs), such as their masses and luminosity distance, al- 
lowing us to track the merger history of the MBH popu- 
lation out to large redshifts. MBHB mergers have been 
one of the main targ ets of the LISA mission since its con- 
ception (see e.g. [12j )■ Several authors have explored how 
spins, higher harmonics in the GW signal and eccentric- 
ity affect parameter estimation and in particular source 
localization, which is fundamental to search for electro- 
magnetic counterparts (see, for example, the work by the 
LISA parameter estimation taskforce [13| and references 
therein). Most work on parameter estimation has fo- 
cused on inspiral waveforms, but ringdown observations 
can also provide precise measurements of the parame- 
ters of remnant MBHs resulting from a merger, and even 
test the Kerr nature of astrophysical MBHs [1J]. Initial 
studies using numerical relativity waveforms suggest that 
mergers will improve the signal-to-noise ratio of individ- 
ual events and the localization accuracy of LISA 
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While highly precise measurements for individual sys- 
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terns are interesting and potentially very useful for mak- 
ing strong- field tests of general relativity, it is the proper- 
ties of the set of MBHB mergers that are observed which 
will carry the most information for astrophysics. To date, 
most of the body of work considering observations of 
more than one MBHB system has focused on the use 
of MBHBs as "standard sirens" jT(| to probe the expan- 
sion history of the Universe. For a subset of the observed 
binaries, LISA may have sufficient angular resolution to 
make follow-up electromagnetic observations feasible. If 
the host galaxy or galaxy cluster can be identified, this 
will allow LISA to measure the dark energy equation of 
state to levels comparable to those expected from other 
dark energy missions [l7j ■ The effectiveness of LISA as 
a dark energy probe is limited by weak lensing [l8| , but 
this can be mitigated to some extent fl9j . and a combi- 
nation of several GW detections may still provide useful 
constraints on the dark energy equation of state [20j . 

GW observations of multiple MBHB mergers could 
also be combined to extract useful astrophysical informa- 
tion about their formation and evolution through cosmic 
history. As already mentioned, our access to the MBH 
population in the Universe is limited to AGNs or to qui- 
escent MBHs in nearby galaxies. In this sense we are 
probing only the tip of the iceberg. Theoretical astro- 
physicists have developed a large variety of MBH forma- 
tion models [!, l2ll - |2~H that are compatible with observa- 
tional constraints. However, the natural lack of observa- 
tions of faint objects at high redshifts and the difficul- 
ties in measuring MBH spins leave a lot of freedom in 
modelling MBH seed formation and mass accretion. In 
the last decade, several authors have employed different 
MBH formation and evolution models to make predic- 
tions for future GW observations, focusing in particular 
on LISA [25 33- This effort has been very valuable, and 
established the detection of a large population of MBH 
binaries as one of the cornerstones of the LISA mission. 

In this paper we tackle the inverse problem: we do 
not ask what astrophysics can do for LISA, but what 
LISA can do for astrophysics. In particular, we ask the 
following question: can we discriminate among differ- 
ent MBH formation and evolution scenarios on the ba- 
sis of GW observations only? More ambitiously, given 
a set of observed MBHB coalescences, what information 
can be extracted about the underlying MBH population 
model? For example, will GW observations tell us some- 
thing about the mass spectrum of the seed black holes at 
high redshift that are inaccessible to conventional electro- 
magnetic observations, or about the poorly understood 
physics of accretion? Such information cannot be gleaned 
from a single GW observation, but it is encoded in the 
collective properties of the whole detected sample of co- 
alescing binaries. In this paper we describe a method 
to extract this information in order to make meaning- 
ful astrophysical statements. The method is based on a 
Bayesian framework, using a parametric model for the 
probability distribution of observed events. 

The paper is organized as follows. Section [TTI presents 



the general framework of our analysis. There we review 
the MBH formation models considered in this paper and 
explain how these models translate into a theoretically 
observable distribution via a "transfer function" that de- 
pends (for a given source) on the detector characteristics 
and on the assumed model for the gravitational wave- 
form. We describe how to sample MBH distributions via 
Monte Carlo methods, and how to interpret the obser- 
vations in a Bayesian framework. In Section IIIII we ap- 
ply these statistical methods to the problem of deciding, 
given a set of LISA observations, whether we can cor- 
rectly tell the true model from an alternative, for each 
pair in our family of MBH formation models. We focus 
in particular on specific comparisons that would allow us 
to set constraints on the main uncertainties in the in- 
put physics: namely, the seed formation mechanism, the 
redshift distribution of the first seeds, the efficiency of 
accretion during each merger and the geometry of ac- 
cretion. In Section IIVI we describe how to go beyond a 
simple catalog of pure models, either by introducing phe- 
nomenological mixing parameters (designed to gauge the 
relative importance of different physical mechanisms in 
the birth and growth of MBHs) between the pure mod- 
els, or by consistently implementing a mixture of differ- 
ent physical assumptions in a merger tree simulation. In 
Section [V] we explore how well a "consistently mixed" 
model can be recovered as a superposition of pure mod- 
els with the phenomenological mixing parameters. In the 
conclusions we point out possible extensions of our work. 
Appendix [X] provides details of our treatment of errors 
(due to instrumental noise, uncertainties in cosmological 
parameters and weak lensing) in the MBHB observations. 
Appendix [Bl compares parameter estimation calculations 
that do, or do not, take into account the orbital motion of 
LISA. The results suggest that angle-averaged codes that 
do not take into account the orbital motion may reduce 
computational requirements in Monte Carlo simulations, 
while still providing reasonable estimates of at least some 
binary parameters. 



II. MASSIVE BLACK HOLES: FORMATION 
MODELS, GRAVITATIONAL WAVE 
OBSERVATIONS AND THEIR 
INTERPRETATION 

Our goal is to assess the effectiveness of GW observa- 
tions in extracting useful information about the evolution 
of the MBH population in the Universe. Recent work by 
Plowman et al. [3l|, HH attempted to address the same 
question. Here we use different techniques, which im- 
prove on their analysis in several ways. Plowman et al. 
used the non-parametric Kolmogorov-Smirnov (KS) test 
to compare distributions of model parameters between 
models. This limited their comparisons to two param- 
eters at a time, as higher-dimensional KS tests are not 
known. We instead use a parametric model by consider- 
ing the number of events in any given part of parameter 
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space to be drawn from a Poisson probability distribu- 
tion. This allows us to use a Bayesian framework for the 
analysis. Such a framework will be important for the 
analysis of the actual LISA observations once these have 
been made, and it can be applied to a parameter space 
of any dimension. 

In Ref. [HI we used similar techniques to compare the 
same four models that were considered by Plowman et 
al., which were the models used for LISA parameter esti- 
mation accuracy studies in Ref. I3[ . In this paper we go 



considerably further by considering six additional mod- 
els, chosen to probe four key aspects of the input physics 
used in structure formation simulations: seed formation, 
metallicity "feedback" , accretion efficiency and accretion 
geometry. In addition, we consider for the first time 
"model mixing". The idea is to assume that the ob- 
served population is drawn from some combination of the 
"pure" models that have been simulated. The Bayesian 
framework allows us to recover a posterior probability 
distribution for the "mixing parameters" that character- 
ize the fraction of each model represented in the observed 
distribution. Such an analysis is not possible in the KS 
framework. The model mixing analysis is very important, 
as the real Universe is most certainly not drawn from any 
of the idealized models that currently exist. The mixing 
parameters will reflect the relative contributions in the 
true Universe of the different input physics in the pure 
models. 

For our analysis, we adopt the following strategy: 

• We consider a set of MBH formation and evolu- 
tion models predicting different coalescing MBHB 
theoretical distributions (Section III A[) : 

• To account for detection incompleteness, we filter 
the distribution predicted by each model using a 
detector "transfer function" that produces the ob- 
served theoretical distributions under some specific 
assumptions about the GW detector (Section lllBI) . 
This is basically the distribution one would observe 
assuming an infinite number of detections; 

• We generate Monte Carlo realizations of the coa- 
lescing MBHB population from one of the models 
( Section III C[) or from a mixture of models (Section 
lIVp . and simulate GW observations of the inspi- 
ralling binaries, including errors in the source pa- 
rameter estimation; 

• We then compare (in a statistical sense) the catalog 
of observed events - including measurement errors 
- with the observed theoretical distributions, to as- 
sess at what level of confidence we can recover the 
parent model. The statistical methods we use are 
detailed in Section TlIDI 

In this paper we will consider the Laser Interferometer 
Space Antenna (LISA) as an illustrative case, but the 
strategy outlined above can easily be generalized to other 



proposed space-borne GW observatories, such as ALIA, 
DECIGO or BBO [Mil. 

An important caveat is that, for a source at redshift z, 
GW observations do not measure the binary parameters 
in the source frame, but rather the corresponding red- 
shifted quantities in the detector frame. For this reason, 
throughout the paper we shall characterize MBHBs via 
their redshifted parameters. Given a MBHB with rest- 
frame masses M\^ r > M 2>r: the masses in the detector 
frame are given by M x = (I + z)M 1>r , M 2 = (1 + z)M 2>r . 
In terms of these masses we can also define (as is the 
custom in GW physics) the total mass M = Mi + AI 2 , 
the mass ratio q = M 2 /Mi, the symmetric mass ratio 
?/ = MiM 2 /M 2 and the chirp mass M = r/ 3/5 M. In our 
calculations we assume a concordance ACDM cosmology 
characterized by Ho = 70km s" 1 Mpc -1 , 51 m = 0.27 and 
0\ = 0.73. 

For simplicity we will focus on the inspiral of circular, 
non-spinning binaries; therefore, each coalescing MBHB 
in our populations will be characterized by only three in- 
trinsic parameters (z, M and q). In terms of gravitational 
waveform modelling, the results presented here can be 
considered conservative. Different accretion models may 
result in different MBH spin distributions. Including spin 
in the analysis will provide additional information that 
will help to further constrain the physical mechanisms at 
work in shaping the MBH population model [32l[37j. The 
inclusion of the merger/ringdown portion of the signal 
will increase the signal-to-noise ratio (SNR) of observed 
binaries and allow measurements of the parameters of 
the merger remnants, providing additional information 
on the mechanisms responsible for MBH growth. 

In the following subsections we will introduce all the 
elements and methodologies relevant to our analysis. 



A. Cosmological massive black hole populations 

The assembly of MBHs is reconstructed through dedi- 
cated Monte Carlo merger tree simulations [|| which are 
framed in the hierarchical structure formation paradigm. 
Each model is constructed by tracing the merger hi- 
erarchy of ^200 dark matter halos in the mass range 
10 11 — I0 15 M Q backwards to z = 20, using an extended 
Press & Schechter (EPS) algorithm (see (3|] for details). 
The halos are then seeded with black holes and their 
evolution is tracked forward to the present time. Fol- 
lowing a major merger (defined as a merger between two 
halos with mass ratio M/ l2 /M^ 1 > 0.1, where Mh 2 is 
the mass of the lighter halo), MBHs accrete efficiently 
an amount of mass that scales with the fifth power of 
the host halo circular velocity and that is normalized to 
reproduce the observed local correlation between MBH 
mass and the bulge stellar velocity dispersion (the M — a 
relation, see [38| and references therein). For each of the 
simulated halos, all of the binary coalescences that oc- 
cur are stored in a catalog. The results for each halo are 
then weighted using the EPS halo mass function and arc 
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numerically integrated over the observable volume shell 
at every redshift to obtain the coalescence rate of MB- 
HBs as a function of black hole masses and redshift (see, 
e.g., Fig. 1 in 28]). We then find the theoretical distri- 
bution of (potentially) observable coalescing binaries by 
multiplying the rate by the LISA mission lifetime (here 
assumed to be three years) to obtain the distribution 
Ni = d 3 Ni/dzdAIdq, where the index i labels the MBH 
formation model. 

In the general picture of MBH cosmic evolution, the 
MBH population is shaped by the details of the seeding 
process and the accretion history. Both issues are poorly 
understood, and largely unconstrained by present obser- 
vations. We identify four key factors that have a direct 
impact on specific observable properties of the merging 
MBHB population: 

1. the seed formation mechanism shapes the initial 
seed mass function; 

2. the impact of metallicity on MBH formation deter- 
mines the redshift distribution of the seeds; 

3. the accretion efficiency determines the growth rate 
of MBHs over cosmic history; 

4. the accretion geometry is crucial in the evolution 
of the MBH spins. 

We explore different formation scenarios by considering 
two different prescriptions for each of the elements in the 
above list, as follows: 

1. The seed formation mechanism. Two distinct fam- 
ilies of models have become popular in the last 
decade, usually referred to as "light" and "heavy" 
seed models. Here we consider two different scenar- 
ios representative of the two possibilities, (i) The 

"VHM" model; developed by Volonteri, Haardt & 
Madau [|[, this model is characterized by light 
seeds (M ~ 1OOM ), which are thought to be the 
remnants of Population III (POPIII) stars, the first 
generation of stars in the Universe JjOJ- (ii) The 

"BVR" model; proposed by Begelman, Volonteri 
& Rees HU, this model belongs to the family of 

"heavy seed" models. Bar within bar instabilities 
[40j occurring in massive protogalactic disks trigger 
gas inflow toward the center, where a "quasistar" 
forms. The core of the quasistar collapses into a 
seed black hole that efficiently accretes from the 
quasistar envelope, resulting in a final seed black 
hole with mass M ~ few xlO 4 M . 

2. Metallicity "feedback". Both of the black hole for- 
mation models described above require that a large 
amount of gas is efficiently transported to the halo 
center. The gas inflow has to occur on a timescale 
that is shorter than that of star formation, to avoid 
competition in gas consumption and disruption of 
the inflow process by supernovae explosions. It has 



been suggested that metal-free conditions are con- 
ducive to efficient gas inflow, as fragmentation is 
inhibited [4lj . If fragmentation is suppressed, and 
cooling proceeds gradually, the gaseous component 
can cool and contract before many stars form. The 
gas metallicity Z is therefore an important environ- 
mental factor to take into account, and we consider 
two cases, (i) "noZ" models; black hole seeding is 
assumed to be efficient at zero metallicity only, with 
a sharp threshold in cosmic time. In these models, 
seeds form at very high redshift (20 > z > 15). (ii) 
"Z" models; efficient seed formation occurs also at 
later times. Here we treat POPIII star and quasis- 
tar black hole formation differently. We still assume 
that POPIII stars can form only out of metal-free 
gas, but we track the probability that a halo at late 
times is still metal-free by adopting the metal en- 
richment models developed in [42j . For the case of 
quasistars, instead, we drop the assumption of zero 
metallicity. This choice is motivated by recent high- 
resolution numerical simulations of gas-rich galax- 
ies at solar mctallicities (e.g. [HI), which show that 
bar within bar instabilities can drive a significant 
amount of gas to the central nucleus before star 
formation quenches the inflow. These models are 
characterized by seed formation also at later times, 
in metal enriched halos. See [24[ for full details on 
the model and its implementation. 

3. The accretion efficiency. MBHs powering AGNs 
exhibit a broad phenomenology; they accrete at 
different rates, with different efficiencies and lumi- 
nosities (see 44| and references therein). In the 
absence of a solid coherent theory for describing 
the accretion process, several toy models are vi- 
able, and we consider two of these models, (i) 
"Edd" accretion model; the easiest possible recipe 
is to assume that accretion occurs at the Edding- 
ton rate, parametrized through the Eddington ra- 
tio f e (we take f e = 0.3 in our models), (ii) "MH" 
accretion model; we also use a more sophisticated 
scheme combining low and high accretion rates, as 
described by Merloni & Heinz [44 1 . 



The geometry of accretion. Standard accretion 
disks are unstable to self gravit y b eyond a few 
thousands of Schwarzschild radii [451 ] . It is there- 
fore not guaranteed that the supply of gas to the 
central black hole will be continuous, smooth and 
planar. We consider two different scenarios, (i) 
Coherent accretion ( "co" models); the flow of ma- 
terial that feeds the black hole is assumed to be 
continuous, smooth and planar. Accretion is a sin- 
gle steady episode lasting about a Salpeter time, 
(ii) Chaotic accretion ( "ch" models) • in this sce- 
nario, proposed by King & Pringle j46|, a single ac- 
cretion event is made of a collection of short-lived 
accretion episodes, and the angular momentum of 
each accreted matter clump is randomly oriented. 
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Name 


i 


Seeding 


Metallicity Accretion model Accretion geometry Ni 


[yr- 1 ] 


VHM-noZ-Edd-co 


1 


POPIII 


Z = 


Eddington 


coherent 


86 


VHM-noZ-Edd-ch 


2 


POPIII 


Z = 


Eddington 


chaotic 


81 


VHM-Z-Edd-co 


3 


POPIII 


all Z 


Eddington 


coherent 


108 


VHM-Z-Edd-ch 


4 


POPIII 


all Z 


Eddington 


chaotic 


113 


BVR-noZ-Edd-co 


5 


Quasistar 


Z = 


Eddington 


coherent 


26 


BVR-noZ-Edd-ch 


6 


Quasistar 


Z = 


Eddington 


chaotic 


24 


BVR-Z-Edd-co 


7 


Quasistar 


all Z 


Eddington 


coherent 


22 


BVR-Z-Edd-ch 


8 


Quasistar 


all Z 


Eddington 


chaotic 


29 


BVR-noZ-MH-co 


9 


Quasistar 


Z = 


Merloni & Heinz 


coherent 


33 


BVR-noZ-MH-ch 


10 Quasistar 


Z = 


Merloni & Heinz 


chaotic 


33 



TABLE I. The ten "pure" MBH population models considered in this paper. For convenience, in the following we will identify 
models by the integer, i, listed in the second column. In the last column, N% denotes the predicted coalescence rate. 



These accretion models primarily lead to different 
expectations for the black hole spins: intermediate- 
high, a ~ 0.6 — 0.9, in the coherent case; low, 
a < 0.2, in the chaotic case 371. In this work we 



ignore black hole spin in the modelling of gravita- 
tional waveforms, and therefore we do not assess 
the impact of spin measurements in resolving dif- 
ferent MBH formation scenarios. However, the ac- 
cretion prescription also leaves an imprint on the 
component masses. The models assume that the 
mass-to-energy conversion efficiency, e, depends on 
black hole spin only, so the two models predict dif- 
ferent average efficiencies of ~ 20% and ~ 10%, re- 
spectively. The mass-to-energy conversion directly 
affects mass growth, with high efficiency implying 
slow growth, since for a black hole accreting at the 
Eddington rate the black hole mass increases with 
time as 



M(t) = M(0) exp 



1 



t 



(1) 



where tEdd = 0.45Gyr. The "coherent" versus 
"chaotic" models thus allow us to study how dif- 
ferent growth rates affect LISA observations. 

By choosing two different prescriptions for each of the 
four pieces of input physics listed above we built ten dif- 
ferent MBH population models, which are summarized in 
Table H We shall refer to these models as "pure" , in the 
sense that we do not mix different recipes for seed forma- 
tion and accretion history (e.g., accretion is either coher- 
ent or chaotic, etc.). We will consider "mixed" models in 
Section llVl It is worth emphasizing that all of these mod- 
els successfully reproduce various properties of the ob- 
served Universe, such as the present-day mass density of 
nuclear MBHs and the optical and X-ray luminosity func- 
tions of quasars. GW observations may therefore provide 
an invaluable tool to constrain the birth and growth of 
MBHs, particularly at high redshift. 



Theoretically observable distributions: the 
transfer function 



In order to compare a set of observed events to a given 
MBH population model, we must map the coalescence 
distribution predicted by the model to a theoretically ob- 
servable distribution which takes into account the "in- 
completeness" of the observations resulting from the lim- 
ited sensitivity of any given GW detector. This informa- 
tion can be encoded in a transfer function T[z,M,q), 
that depends only on the detector characteristics and on 
the gravitational waveform model. 

We model the detector and the gravitational waveform 
following Refs. [U l48t. The detector response is mod- 
elled following Cutler [49| : the three-arm LISA constella- 
tion is thought of as a superposition of a pair of linearly 
independent two-arm right-angle interferometers, and we 
can estimate the effect of "descoping options" or a failure 
on one satellite by assuming that only one of the two de- 
tectors is operational. The MBHB inspiral signal is mod- 
elled using the restricted post- Newtonian approximation, 
truncating the GW phasing at second post- Newtonian or- 
der - i.e., at order (y/c) , where v is the binary orbital 
velocity. We also limit our analysis to circular inspirals 
of nonspinning MBHs and neglect contributions to the 
observable signal that come from higher harmonics in 
the inspiral signal and from the (gravitationally loud) 
merger /ringdown phase. The latter assumption signif- 
icantly underestimates the energy carried in the GWs 
[50T |. the SNR of the signal 51 1 and the accuracy in es- 
timating the source parameters [l5j . From the point of 
view of studying MBH populations, it also means that 
we discard all information on the mass and spin distri- 
bution of MBHs formed as a result of each merger [lj| ■ 
In this sense, our assessment of the potential of LISA to 
constrain MBH formation models should be considered 
conservative. 



An important advantage of working in the frequency 
domain and of adopting a simplified waveform model is 
that we can sample the three-dimensional space (z, M, q) 
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by fast Monte Carlo simulations using an adaptation of 
the Fortran code described in Typically, we can 
estimate SNRs and parameter estimation errors of ~ 10 6 
binaries in one day on a single processor. This would 
not be possible with a more complex time-domain code 
including spin dynamics, such as that used in [32j . We 
consider a 21 x 21 x 21 three-dimensional grid spaced log- 
arithmically in the intervals q G [1CT 3 , 1], M 6 [10 3 , 10 8 ], 
and approximately linearly in z (namely, we consider 
z = 0.5 and then all values of z = 1, ...,20 in steps 
of Az = 1), for a total of 9261 points. At each point, we 
generate 1000 binaries assuming random position in the 
sky and orientation, random phase at coalescence, and 
coalescence time t c in the range [0, 3yr] (i.e., we consider 
only events that coalesce during the LISA mission). The 
GW signal is calculated in the Fourier domain in the sta- 
tionary phase approximation. Our statistical analysis, 
which will be discussed in section Hi D[ includes parame- 
ter measurement errors, which are modelled as described 
in Appendix [A] The modelling of errors due to instru- 
mental noise relies on the computation of the so-called 
Fisher information matrix f52| . For each system we com- 
pute the Fisher matrix and its inverse by means of the LU 
decomposition, as described in j47| ■ The accuracy of the 
inversion is usually worse for certain values of the intrin- 
sic parameters and of the angular position/orientation of 
the binary. Wc discard "bad" Fisher matrix inversions 
by monitoring a quantity e; nv , defined as 

e ilw =max|/" um -<%| , (2) 

i,3 



where /j" um is the "numerical" identity matrix obtained 
by multiplying the inverse matrix by the original, and 
Sij is the standard Kronecker delta symbol [47|. We set 
a maximum tolerance of e; nv = 10 ~ 3 to accept the inver- 
sion. For the accepted events, we compute the SNR p 
and then we define the transfer function as 

T(z, M, q) = — , (3) 

where N = 1000 is the number of successful matrix in- 
versions at any given grid point and N(p > pthr) is the 
number of binaries fulfilling the condition p > pthr, where 
Pthr is a pre-specificd SNR threshold. 

We consider four transfer functions Tj(z,M,q) (j = 
1,2,3,4) according to the following prescriptions: (1) 
one interferometer, pthr = 8; (2) one interferometer, 
pthr = 20; (3) two interferometers, p t hr = 8; (4) two inter- 
ferometers, pthr = 20. The chosen thresholds correspond 
(roughly) to the minimum SNR for which we expect to 
be able to claim a confident detection (pthr = 8) and 
the minimum SNR for which we expect to obtain a de- 
cent accuracy in estimating the parameters of the source 
(pthr = 20). Note that, by definition, < T(z, M, q) < 1. 

Examples of Ts(z, M, q) in the (M, q) plane at different 
redshifts are shown in Figure[TJ As expected, the transfer 
function is close to unity in the whole of the (M, q) plane 
at low redshifts, but a smaller number of events are ob- 
servable as we consider binaries coalescing at higher red- 
shifts. When we remember that high rcdshifted masses 
correspond to low observation frequencies, it is easy to 
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understand that the characteristic shape of the contour 
plots for large redshift (say, z = 20) reflects the shape 
of the LISA sensitivity curve (cf. Figure 1 of Rcf. [47f ) . 
More details of the calculation of SNRs and parameter es- 
timation errors in the three dimensional space [M , q , z) 
are given in Appendix [Bj 

The transfer functions arc coupled to the event distri- 
butions predicted by the models to obtain the theoreti- 
cally observable distributions Nt{z, M,q) for each model 
under the different assumptions on the transfer function; 
namely, 

d 3 N- 

where i labels the MBH population model being consid- 
ered and j labels the assumed detector specific^. These 
are the distributions which should be compared to simu- 
lated observed catalogs of MBHBs. 

For illustration, in Figure [5] we compare the marginal- 
ized distributions 

dNi f,f, d 3 N t 

——— = dz I da — — -rrr-r and 

dM J J dzdMdq 

dz J J dzdMdq v ' 

(thin lines) with the corresponding marginalized distri- 
butions 



N Tt 3 (M)= j dz j dqN Tt 3 (z,M,q) and 
N Tit3 (z)= j dM j dqN Tit3 (z,M,q) (6) 



(thick lines). Note that for some of the heavy seed models 
(namely, the short-dashed line corresponding to model 
BVR-noZ-MH-co) the two curves perfectly overlap: in 
these cases LISA observations do not miss events, i.e. 
they are "complete". 



C. Synthetic Monte Carlo catalogs 

To simulate LISA observations we perform 1000 Monte 
Carlo samplings of the d 3 Ni/ dzdMdq distribution pre- 
dicted by each model, producing 1000 catalogs of coalesc- 
ing binaries over a period of three years. In each catalog, 
the source position in the sky and the direction of the 
orbital angular momentum are assumed to be uniformly 
distributed. The phase at coalescence $ c and the coales- 
cence time t c are randomly chosen in the range [0, 2tt] 



1 Note that, in principle, the transfer function may depend on a 
third index fc, which labels the waveform model used for matched 
filtering. We do not consider this problem here, but the impact 
of waveform models on constraining the MBH population is an 
important topic for future study. 
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FIG. 2. Examples of the marginalized distributions dNi/dM 
(upper panel) and dNi /dz (lower panel) predicted by different 
MBH formation models. In each panel we plot the following 
models: VHM-noZ-Edd-co (solid red lines); BVR-noZ-Edd- 
ch (long-dashed green lines); BVR-noZ-MH-co (short-dashed 
blue lines). Thin lines represent the coalescence distributions 
predicted in three years, while thick lines represent the the- 
oretically observable distributions after the transfer function 
Ti{z,M,q) has been applied, namely Nt € 3 (M) and 3 (z) 
(see text for details). 



and [0, 3yrs], respectively. Each waveform is described 
by the set of parameters 

9 = (log A, log M, log??, t c ,$ c ,0 s , (7) 

where (i c ,$ c ) are the phase and time of coalescence, 
(6s, $s) represents the source location in ecliptic coordi- 
nates, (f?^, give the orientation of the orbital angular 
momentum of the binary and the GW amplitude of the 
signal 



A oc 



M 



5/6 



where 



D, 



i + z r dz^_ 

~WJo (1 + z') 2 Mm(1 + ^) 3 +"a] 1/2 



(8) 



(9) 



is the luminosity distance to the source. Our theoretical 
distributions are not functions of (M , 77, -Dl), but rather 
functions of (M, q, z), and the mapping between the two 
sets of parameters is given in Appendix [A] 

LISA measurements will yield a set of data {-Dfc}, 
k = 1,...,N, where N is drawn from a Poisson dis- 
tribution with mean N; coincident with the theoretical 
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number of events predicted by the model we consider 
(cf. Table [I]). Each element in the set is described by 
(f, a z ; M, um', q, cr q ), where z, M, q are the true param- 
eters of the system and a z , om 7 o~q are the diagonal ele- 
ments of the variance-covariance matrix describing the 
measurement errors. The latter are computed as de- 
scribed in Appendix [Al and include contributions from in- 
strumental noise, from uncertainties in cosmological pa- 
rameters and from weak lensing. We approximate the 
covariance matrix as diagonal since this is conservative, 
and the covariances are generally small. Strictly speaking 
we are not justified in ignoring the large covariance be- 
tween any two mass parameters (say, M and 77); however 
the errors on the mass parameters are always negligible 
when compared with errors on luminosity distance, cos- 
mological parameters and weak lensing (see Appendix 
lA"]) . The probability density function for the measured 
source parameters is then a multivariate Gaussian with 
these standard deviations, centred at the true source pa- 
rameters. As discussed in the next section, the errors can 
be folded into the analysis in two ways. The one we adopt 
is to construct the theoretically observable distribution, 
Afi t j(z, M,q) as described in Section Til B[ by spreading 
each source over multiple bins according to the Gaussian 
probability distribution for the measurement errors. We 
then construct data sets by assigning a unique set of ob- 
served parameters to each event that is equal to the true 
parameters, plus a random error drawn from the same 
probability distribution. For each "pure" MBH popula- 
tion model (label V) and LISA transfer function (la- 
bel "j"), we produce 1000 of these "observed" datasets 
{Dk}ij, to compare to the theoretically observable dis- 
tributions. Examples of Monte Carlo generated datasets 
are shown in Figure [3] 



1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 




5 10 15 
redshift 



FIG. 3. Examples of Monte Carlo generated datasets. The 
left panels show the dNi/dM distributions, the central pan- 
els show the dNi/dq distributions and the right panels show 
the dNi/dz distributions. The upper panels refer to model 
BVR-Z-Edd-co, the lower panels to model VHM-Z-Edd-co. 
In each panel the dotted curves represent the theoretical dis- 
tributions, and the solid curves represent the theoretically ob- 
servable distribution filtered with the transfer function T3. 
The thick histograms show one Monte Carlo realization of 
the theoretical distribution, as observed by LISA, under the 
assumption of two operational interferometers and pthr = 8. 



D. Statistical analysis tools 



Throughout our study, we will assume T b s = 3 yrs as 
the fiducial LISA mission lifetime. However, it is inter- 
esting to study how the performance of LISA improves 
as a function of the duration of the data stream used 
in the analysis. This problem could be particularly rel- 
evant if, as expected, there are gaps in the LISA data 
stream. For this reason we will consider increasing ob- 
servation times, T bs, of 3 months, 6 months, 1 year, 18 
months, 2 years and 3 years, respectively. To construct 
these reduced datasets, we just pick events from the cat- 
alog that coalesce at t c < T b s , and then renormalize the 
theoretical distributions by a factor T Q b s /3yr. In doing 
this, we ignore sources that coalesce outside the reduced 
observation time, but which may have enough SNR to be 
detected in the shorter data segment. This is conserva- 
tive since we are effectively choosing only to include the 
coalescing sources in our analysis. However, for MBHBs, 
unlike the EMRI case (see Ref. [Hj]), almost all of the 
source SNR (and, consequently, the accuracy in the de- 
termination of M, q, and z) is accumulated in the last 
month of inspiral, and so there would not be a great deal 
to gain by including these sources in the analysis. 



In this work we will adopt a Baycsian approach to 
model selection and parameter estimation. This requires 
a parametric model for the distribution of events that 
LISA will observe. A particular astrophysical model of 
MBH formation cannot predict the actual number of 
events that will occur during the LISA mission, as the 
mergers will occur stochastically, but instead predicts the 
rate at which events with particular parameters occur. 
Assuming random start times, the number of events, rii, 
that will be seen in a particular bin, Bi, in parameter 
space will be drawn from a Poisson probability distribu- 
tion with parameter equal to the rate integrated over 
the bin: 



Vini) 



(10) 



If we divide the parameter space up into a certain num- 
ber of bins, K say, then the information that comes from 
LISA (the data D) is the number of events in each bin. 
The overall likelihood, p(D\\,M) of seeing this data un- 
der the model M with parameters A is the product of the 
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Poisson probabilities for each bin 



K 

p(D\X,M) = Y[ 

i=l 



-n(A) 



(11) 



The rates that enter this expression are the rates for ob- 
served events, i.e., the product of the intrinsic rate pre- 
dicted by the model with the LISA transfer function, as 
discussed in the previous section. It is straightforward to 
take the limit of this expression as the bin sizes tend to 
zero to derive a continuum version of this equation [53| . 

LISA will not be able to measure the parameters of 
each system perfectly due to instrumental noise. In ad- 
dition, weak lcnsing will introduce errors in the measure- 
ments of luminosity distance. Since we wish to use red- 
shift rather than distance as a parameter, further errors 
will be introduced from imperfect knowledge of the lumi- 
nosity distance-redshift relation. The modelling of these 
errors was mentioned earlier, and is described in detail 
in Appendix \E1 There are two ways in which the errors 
can be folded into the statistical analysis. Once LISA 
observations have been made, we will obtain posterior 
probability distributions for the source parameters which 
account for the error-induced uncertainties. The likeli- 
hood will then be computed by integrating the contin- 
uum version of Eq. (fTTj) over the posterior, as described 
in [HI]. The second approach, which we adopt here as 
it is more appropriate for a priori studies of this type, is 
to fold the expected errors into the computation of the 
observed rates, r^. In practice, we compute these rates 
directly from the Monte Carlo realizations described in 
the preceding section. For each source in the catalog 
we can assign fractional rates to every bin in parame- 
ter space, computed by integrating the error probabil- 
ity distribution for the source over that particular bin. 
In other words, we spread each source out into multiple 
bins, as predicted by the error model described earlier. 
When generating realizations of the LISA data set, we 
assign each source to one bin only, according to some "ob- 
served parameters" (which could represent, for instance, 
the maximum a posteriori parameters of the source). We 
take these observed parameters to be equal to the true 
parameters plus an error drawn from the same error dis- 
tribution. 

Given the likelihood described above, Bayes theorem 
allows us to assign a posterior probability, p(X\D, M), to 
the parameters, A, of a model, M, given the observed 
data, D, and a prior, 7r(A), for the parameters A: 



p(X\D,M) = 



p(D\X,M)ir(X) 



Z = / p(D\X, M) 7r(A)d Ar A 



(12) 



When comparing two models, A and B, that could each 
describe the data, we can compute the odds ratio (see, 
for example, [54|) 

Z A P(A) 



in which P{X) denotes the prior probability assigned to 
model X. If Oab > 1 {Oab < 1), model A (model B) 
provides a much better description of the data. 

In this paper, we will consider two types of model com- 
parison. In Section HVl we will consider mixed models in 
which the observed distribution is drawn from a superpo- 
sition of two or more of the underlying "pure" models. In 
those cases, the models depend on one or more free "mix- 
ing" parameters for which we will obtain posterior distri- 
butions using Eq. (fT2|) . First, however, in Section ILH1 we 
will make direct comparisons between the pure models. 
In that case, the models do not have any free parameters. 
The odds ratio, (|T3|) . then reduces to the product of the 
likelihood ratio with the prior ratio 



Oab 



p(D\A) P(A) 
p(D\B)P(B) 



(14) 



Oai 



Z b P{B) ' 



(13) 



The models we consider have all been tuned to match ex- 
isting constraints, and so at present we have no good rea- 
son to prefer one model over the others. Wc therefore as- 
sign equal prior probability to each pure model, P(A) = 
P{B) = 0.5, and the odds ratio becomes the likelihood 
ratio. We assign probability pa = p(D\A)/(p(D\A) + 
p(D\B)) to model A, and ps = 1 — Pa to model B. 

Once LISA data is available, each model comparison 
will yield this single number, pa, which is our confidence 
that model A is correct. Since the LISA data is not cur- 
rently available, we want to work out how likely it is 
that we will achieve a certain confidence with LISA. So, 
we generate 1000 realizations of the LISA data stream 
and look at the distribution of the likelihood ratio and 
confidence over these realizations. We can represent the 
results of this analysis in two alternative ways. These 
are illustrated in Figure 21 and we will refer to the two 
panels of this figure extensively in the following. The 
left panel shows a receiver operator characteristic (ROC) 
curve. This is a "frcquentist" way to represent the data. 
To generate this plot, we assume that we have specified 
a threshold on the statistic, in this case the likelihood ra- 
tio, before the data is collected. If the value of the statis- 
tic computed for the observed data exceeds the threshold 
then model A is chosen, otherwise model B is chosen. For 
a given threshold, the frequency with which the threshold 
is exceeded for realizations of model B defines the false 
alarm probability (FAP), while the frequency with which 
the threshold is exceeded for realizations of model A de- 
fines the detection rate. The ROC curve shows detection 
rate vertically versus FAP horizontally. In the figure, we 
indicate how, for an FAP of 10%, we can find the detec- 
tion rate, which in this case is 49%. This is the format we 
used to present our previous results in Ref. [33|. While 
the ROC is a convenient way to represent the data, it is 
incomplete, in that it does not tell us by how much we 
exceed the threshold: the result is far more convincing if 
we obtain a likelihood ratio 10 times the threshold, than 
1.1 times the threshold. 

The right panel of Figure U shows an alternative rep- 
resentation of the same data which contains this addi- 
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FIG. 4. Two alternative ways to represent our ability to distinguish models. The left panel shows an ROC curve, while 
the right panel shows the CDF of the confidence achieved over multiple realizations of the correct (upper curve) and wrong 
(lower curve) model. More details are given in the text. The model comparison used for this figure was VHM-noZ-Edd-co to 
VHM-noZ-Edd-ch, for a three month LISA observation and the most pessimistic assumption (T2) on the transfer function. 



tional information. It shows the cumulative distribution 
function (CDF) for the "confidence" we would have in 
model A, based on our observation, i.e., the probabil- 
ity, pa, we assign to model A in a Bayesian interpreta- 
tion of the results of an observation. The upper curve is 
the CDF computed over multiple realizations of model 
A (i.e., the horizontal axis then shows our confidence in 
the true model), while the lower curve shows the CDF 
computed from realizations of model B (i.e., the horizon- 
tal axis then shows our confidence in the wrong model). 
The best way to interpret this plot is to choose a certain 
confidence level, e.g., p = 0.95 (approximately 2a). The 
value on the upper curve is the frequency with which this 
confidence level, or better, would be achieved in a LISA 
observation when that model was correct, while the value 
on the lower curve at 1 — p is the frequency with which 
we would not be able to rule out model A with that con- 
fidence, when it was not true. 

The CDF plot encodes the same information as the 
ROC curve. If we assign a certain FAP, say 10% as be- 
fore, we draw a horizontal line at that value and find 
where it intersects the lower curve. This tells us the 
confidence level corresponding to that FAP, in this case 
0.67. The value on the upper curve at this confidence 
level is the detection rate at that FAP, and we find that 
it is 49%, as expected. In the current paper, we will use 
this second, Bayesian, representation of the results for all 
the remaining plots, as it encodes all of the information 
that can be gleaned from the Monte Carlo simulations. 
The Bayesian approach assigns relative probabilities to 
the models, rather than making a binary statement that 
model A is "right" or model B is "right" . 

The models we consider differ not only in the distri- 
bution of events that they predict, but also in the total 
number of events. As the latter could be considered a less 
robust prediction of the models, we can ask whether it 
carries much weight for model selection. This can be done 
by introducing a free parameter into each model, which 
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FIG. 5. Comparison of performance of model selection when 
including the total number of events as a parameter of the 
model (labeled "no marginalization" ) and when this parame- 
ter is marginalized over (labeled "with marginalization"). The 
small difference indicates that the total number of events con- 
tains relatively little information compared to the shape of the 
parameter distributions. We show the same model compari- 
son as in Figure [3] 

is an overall normalizing factor, and then marginalizing 
over it, i.e., integrating the posterior probability over this 
parameter. We write = Nfi, where fi is the rate in 
bin i for a model that predicts 1 event in total. The 
probability marginalized over N is 

p(D\M)= [H^j \J2n N °**e- n , (15) 

\i=l l " / n=l 

where A^bs = X) n i * s the total number of events ob- 
served. The summation in the second term is dependent 
only on A^ b s and, as such, is model- independent. It can 
thus be seen that 

p{D\M) oc p(D \M)c Nm N~ N ° b ° , (16) 
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FIG. 6. Results for comparisons of the pure models. Each plot shows all possible comparisons varying only one of the 
elements listed in Table [U Top left panel: we consider the effect of the accretion geometry, comparing coherent to chaotic for 
each of the combinations of the other ingredients. Top right panel: we consider the effect of the accretion model, comparing 
Eddington accretion to Merloni-Heinz accretion for the BVR-noZ models. Bottom left: we consider the effect of metallicity 
by comparing the noZ to Z models for VHM-co, VHM-ch, BVR-co and BVR-ch. Bottom right: we consider the effect of the 
seeding assumption, comparing the VHM to BVR models for the four combinations noZ-co, noZ-ch, Z-co and Z-ch, each with 
Eddington accretion. In all panels we are making the most pessimistic assumptions about the detector, i.e., we use the transfer 
function Ti (one interferometer, p t hr = 20). These results are for a 3 month LISA observation, except for the top left panel 
which is for a one year observation. 



where Nm is the number of events predicted by the un- 
normalized model M. We can decouple the contribution 
from the total number of events and the distribution of 
event parameters by replacing the likelihood p(D\M) by 
the marginalized likelihood p(D\M) in the likelihood ra- 
tio. In Figure [5] we show the effect this has on the CDFs 
for the Bayesian confidence, for the comparison between 
model VHM-noZ-Edd-co and VHM-noZ-Edd-ch. We see 
that including marginalization makes very little differ- 
ence to the results. This implies that the number of 
events predicted by a model contains little information 
relative to the parameter distribution. For the remain- 
ing plots in this paper, we do not marginalize over Nm, 
but we have checked that in all cases the effect of the 
marginalization is small. 



III. RESULTS FOR THE PURE MODELS 

We now describe the results of our analysis. In this sec- 
tion we will compare pairs of pure models using the tech- 
nique described in the previous section. We generated 
1000 realizations for each model, as described in Section 
III CI For each pair of models A and B, we computed 
the CDF of the Bayesian confidence of model A versus 
model B over the realizations of model A and those of 
model B. We present selected results in Figure [5] using 
the CDF curves described in the previous section. 

Each panel in Figure [6] shows the results for pairs of 
models that differed in only one of the four aspects of the 
input physics detailed in section III Al and listed in Table 
HI To be conservative, we consider a pessimistic scenario 
for the detector (transfer function T^. one interferometer, 
Pthr = 20). 

In the upper-left panel we show all possible (five) com- 
parisons among pairs of models differing only in the ac- 



12 



crction geometry (e.g., BVR-noZ-Edd-co vs BVR-noZ- 
Edd-ch), assuming a one year observation. Since we ig- 
nore the spin distributions, this is the property to which 
we are least sensitive, as clearly shown by the relatively 
small separation of some pairs of curves in the panel. In 
most cases the models are barely distinguishable at any 
reasonable confidence level. 

In the upper-right panel we compare models differing 
in their accretion model (Edd vs MH, two comparisons), 
assuming a three month observation. Our models are 
clearly more sensitive to this parameter, and they can be 
clearly discriminated with only three months of data. 

In the lower-left panel we investigate the impact of 
mctallicity (Z vs noZ, four comparisons). Pairs of mod- 
els are generally well separated even under pessimistic as- 
sumptions (a three month observation), and we can put 
forward an interesting astrophysical interpretation of the 
results. This panel shows that mctallicity "feedback" is 
better discriminated in high- mass seed models (BVR). 
This is because the effect of metallicity is to change the 
rcdshift distribution of the seeds. If seeds are massive, 
we can clearly detect this redshift difference by directly 
observing the first coalescing seeds in the Universe (recall 
that LISA observations are basically complete for mas- 
sive seed models, as shown in Figure [2]). Unfortunately, 
LISA is deaf to coalescences of a few hundred solar mass 
binaries at high z. Therefore, in low-mass seed models, 
we can only measure the redshift distribution of the seeds 
indirectly (by observing the distribution of mergers at a 
later cosmological epoch), and models are consequently 
harder to discriminate. 

Finally, in the lower-right panel, we look at the seed- 
ing process (left: VHMvs BVR, four comparisons). Here 
the result is very similar to the effect of metallicity. Pairs 
of models are typically well separated, especially if seeds 
form even at later times (the models). If seeds form at 
high redshift only, then the mass distribution of coales- 
cences at lower redshift tends to be more similar, as mass 
growth by accretion erases the differences in the initial 
seed masses. 
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FIG. 7. Effect of the transfer function on the pure model selec- 
tion results. We compare VHM-noZ-Edd-co and VHM-noZ- 
Edd-ch, assuming a fixed LISA mission duration of 3 months. 
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FIG. 8. Effect of the LISA observation duration on the pure 
model selection results. We compare VHM-noZ-Edd-co and 
VHM-noZ-Edd-ch, under the most pessimistic choice (T2) for 
the transfer function. 

We emphasize that the results discussed so far have 
made the most pessimistic assumptions about the detec- 
tor performance, i.e., three months of observation with 
a single interferometer and pthr = 20. Under such as- 
sumptions only a handful of sources will be detected, 
but this is already sufficient to discriminate among most 
of the models. In Figures [7] and [8] we consider a spe- 
cific model comparison (namely VHM-noZ-Edd-co versus 
VHM-noZ-Edd-ch) to display the effect of relaxing these 
assumptions. 

Figure [7] shows that the detector performance does not 
affect the results substantially. Lowering p t hr from 20 to 
8 for two operational interferometers only adds a few, low 
SNR, sources to the detected sample, and the gain in dis- 
crimination power is limited. On the other hand, Figure 
[8] shows that the observation duration is crucial. With 
an observation time of three months, we would achieve a 
2cr confidence level (pa = 0.95) with only ~ 10% proba- 
bility (i.e., if we repeated an independent 3 month LISA 
observation 10 times, we would expect one of these to 
reach 2a confidence). However, with an observation time 
of three years, the probability that we will achieve 2a 
confidence in the underlying model is more than 90% 
(upper dashcd-black curve). There is a similar trend in 
all model comparisons, although the three month result 
is particularly bad for this particular comparison, since 
these models differ only in the accretion geometry which 
we have seen is the most difficult aspect to distinguish. 
The trend with observation duration arises simply be- 
cause the number of detected sources increases linearly 
with the observation time, and so we have a much bet- 
ter sampling of the underlying model for longer mission 
durations. 

Comparisons between all possible pairs of models are 
given in Table HH where we assume a pessimistic detector 
performance and three months (left) or one year of ob- 
servation (right), respectively. Even though it is difficult 
to discriminate among some specific pair of models in 
the three month observation case, model discrimination 
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0.20 


0.05 


0.04 


X 


0.01 


0.93 


0.94 


8 


0.58 


0.51 0.16 


0.19 


0.07 


0.07 


0.98 


X 


0.94 


0.95 


9 


0.16 


0.12 0.23 


0.31 


0.03 


0.04 


0.17 


0.15 


X 


0.01 


10 


0.09 


0.07 0.15 


0.18 


0.02 


0.03 


0.14 


0.13 


0.95 


X 





1 


2 


3 


4 


5 


6 


7 


8 


9 


10 


1 


X 


0.49 


0.99 


0.99 


1.00 


1.00 


0.91 


0.88 


1.00 


1.00 


2 


0.50 


X 


0.99 


0.99 


1.00 


1.00 


0.92 


0.93 


1.00 


1.00 


3 


0.00 


0.00 


X 


0.83 


0.97 


0.97 


1.00 


1.00 


1.00 


1.00 


4 


0.02 


0.02 


0.19 


X 


0.99 


0.99 


0.99 


0.99 


0.99 


0.99 


5 


0.00 


0.00 


0.03 


0.00 


X 


0.07 


1.00 


1.00 


1.00 


1.00 


6 


0.00 


0.00 


0.03 


0.00 


0.93 


X 


1.00 


1.00 


1.00 


1.00 


7 


0.07 


0.06 


0.00 


0.00 


0.00 


0.00 


X 


0.16 


1.00 


1.00 


8 


0.09 


0.05 


0.00 


0.00 


0.00 


0.00 


0.85 


X 


1.00 


1.00 


9 


0.00 


0.00 


0.00 


0.01 


0.00 


0.00 


0.00 


0.00 


X 


0.31 


10 


0.00 


0.00 


0.00 


0.00 


0.00 


0.00 


0.00 


0.00 


0.61 


X 



TABLE II. Summary of all possible comparisons of the pure models. The table on the left (right) assumes a LISA observation 
time of three months (one year), respectively. Models are labeled by an integer i, as listed in Tabled] We take a fixed confidence 
level of p = 0.95. The numbers in the upper-right half of each table show the fraction of realizations in which the row model 
will be chosen at more than this confidence level when the row model is true (in the Bayesian figures, this would be the point 
where a vertical line at x = p intersects the upper curve). The numbers in the lower-left half of each table show the fraction 
of realizations in which the row model cannot be ruled out at that confidence level when the column model is true (in the 
Bayesian figures, this would be the point where a vertical line at x — 1 — p intersects the lower curve) . These results are for 
the pessimistic transfer function (T2). 



is almost perfect in most cases for a one year observation. 
The exception are the models differing in their accretion 
geometry only (bold numbers in the table), for which 
discrimination is difficult. However, even for such sim- 
ilar models we will obtain a high confidence level with 
probability close to unity if we assume a standard LISA 
configuration with two operational interferometers ob- 
serving for three years. 



IV. MIXED MODELS 

In the preceding section we (successfully) demon- 
strated the potential of LISA to discriminate among a 
discrete set of "pure" models given a priori. However, 
the true MBH population in the Universe will probably 
result from a mixing of the physical processes described in 
SectionHEl or even from a completely unexplored physi- 
cal mechanism. It is therefore important to test whether 
we will be able to extract useful information when the 
distribution of observed events comes from a mixture of 
the different models, as an approximation to possible un- 
knowns. For this case study, we will concentrate on the 
details of the seeding mechanism (mass function and red- 
shift distribution), deferring the more complicated details 
related to accretion to a future study. Recall in this con- 
text that accretion will leave a trace in the spin distri- 
bution of MBHs, but our simplified analysis neglects the 
MBH spins by construction. 

We tested two mixing procedures: (i) we generated 
artificially mixed models that were a linear combina- 
tion of the pure model distributions presented in Section 
III At (ii) we constructed two consistently mixed models, 



in which seeds were generated according to a mixing of 
two prescriptions, and their evolution was followed self- 
consistently in the halo merger tree realizations. The goal 
here is to assess whether artificial models can reproduce 
the salient features of the consistently mixed models, and 
to estimate the amount of mixing necessary to "best fit" 
the consistently mixed models. This procedure mimics 
the analysis of a "real" LISA datastrcam, for which the 
data is unlikely to match exactly any one of the pure 
model predictions. 

In this section we describe details of the artificially 
and consistently mixed models. In Section [V] that fol- 
lows we will present the results of the "reconstruction 
experiment" . 

Artificial mixing simply consists in drawing coales- 
cences from a linear combination of the theoretical coales- 
cence distributions predicted by the pure models. Here, 
for concreteness, we fix the accretion to be Eddington- 
limited and coherent, and we mix different seeding 
recipes. We therefore consider models VHM-noZ-Edd- 
co, VHM-Z-Edd-co, BVR-noZ-Edd-co, and BVR-Z-Edd- 
co (i = 1, 3, 5 and 7 respectively, in the notation of 
Table U. 

Each model i is characterized by a mean number of 
predicted coalescences Ni and a probability distribution 
for the parameters of the coalescing binaries Pi(M, q, z). 
The predicted event distribution Ni = d 3 Ni/dzdMdq 
(see Section Hi Ap can therefore be factorized as 

N i = N i xp i (M,q,z). (17) 

The mixing is described by four parameters fi (i = 
1,3,5,7) which determine the fraction of model i in- 
cluded in the mixed distribution. These fractions are 
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FIG. 9. Examples of mixed models. In the upper panels we 
show marginalized dNi/dM (left) and dNi/dz (right) distri- 
butions for the model TV-I (thick solid black lines), in which 
we mix models VHM-noZ-Edd-co (thin solid red line) and 
BVR-noZ-Edd-co (thin long-dashed green line). The relative 
contribution of the models is given by Eq. (|19|l with /i = 0.23, 
f 3 = 0.77 (cf. Tabled]). In the lower panel s we show the same 
distributions for the model TV-IV (thick solid black lines) in 
which we mix four "pure" models. The thin lines represent the 
relative contribution of the individual models VHM-noZ-Edd- 
co (solid red), VHM-Z-Edd-co (short-dashed blue), BVR-noZ- 
Edd-co (long-dashed green), and BVR-Z-Edd-co (dot-dashed 
magenta). The relative contribution of the models is given 
again by Eq. $TU$ with /i = 0.08, f 3 = 0.22, f 5 = 0.56 and 
f 7 = 0.14 (cf. Table HJ). 



constrained to add up to 1. 



A. Artificial mixing 

We tried two different mixing prescriptions. In the first 
case we ignored the number of coalescences predicted by 
each specific model, by mixing the respective Pi(M, q, z) 
distributions (p mixing) and normalizing the mixed dis- 
tribution to some arbitrary number: 



M p = TV m {/ipi + f 3 p 3 + Upb + hpr) ■ 



(18) 



where TV m was fixed to 200 coalescences in three years. 
In the second case we considered the number of pre- 
dicted events to be an intrinsic property of each indi- 
vidual model, and we simply mixed the Ni(M,q, z) dis- 
tributions (TV mixing) in the same way: 



A/V = /iM + / 3 A/- 3 + /5/V5 + hM 7 



(19) 



The total number of coalescences is now automatically 
determined by the values of the mixing parameters. In 
practice, in order to enforce the constraint that the frac- 
tions add up to 1, we actually use a "nested" prescription 
based on three parameters a, /? and 7, which are allowed 
to take any value in the range [0, 1]. We then set 

TvV = a/Vi + (1 - a){/3N 3 + (1 - p)frAf 5 + (1 - 7 )A/- 7 ]} . 

We quote our results in terms of the model fractions /j, 
as these are the physically relevant quantities. 

Table Hill lists eight mixed models that we investigated. 
Examples of TV-mixed model (model TV-I and TV-IV) are 
also shown in Figure [9] The theoretically observable dis- 
tributions are generated in the same way as for the pure 
models, by multiplying the TV" distributions by the appro- 
priate transfer function. Observed datasets Dk are then 
generated from the mixed distribution as outlined in Sec- 
tion UTC] Given a dataset Dk, the idea is to parametrize 
the distribution as a mixture of the available "pure" dis- 
tributions according to Eqs. (|T8| or (fl~9|) . and obtain a 
posterior distribution for the mixing parameters given 
the observed data. These posterior distributions allow us 
to assess which models were mixed, and at what mixing 
level. To make the test "realistic" , the theoretical mixing 
and the simulated LISA observations were performed by 
A. Sesana. The observed datasets Dk were then analyzed 
blindly by J. Gair, who did not know which models were 
mixed nor the amount of mixing. 



B. Consistent mixing 

We used the consistently mixed models (hybrid mod- 
els, henceforth labelled "HY") described in Ref. [Hj]. The 
seeding process was a mixture of the VHM-Z and B VR- 
Z mechanisms, and the MBH mass growth assumed Ed- 
dington limited, coherent accretion. We considered two 
models with fixed POPIII seeding efficiency, but different 
quasistar seeding efficiencies. The quasistar seeding effi- 
ciency is related to the maximum halo spin parameter, A, 
that allows efficient transfer of gas to the center to form 
a quasistar (see Ref. [24| for details). We test an ineffi- 
cient quasistar seeding model (A = 0.01, HY-I) and an 
efficient quasistar seeding model (A = 0.02, HY-IT) that 
predict MBH population observables (local mass func- 
tion, quasar luminosity function, and so on) bracketing 
the current range of allowed values. 

To check the effectiveness of our analysis tools in ex- 
tracting information about the parent MBH population, 
we try to recover the hybrid model distributions as a 
mixing of the VHM-Z and BVR-Z "pure" models, of the 
form given by either Eq. (jT5J) or (fTi?)) . The procedure is 
the same as detailed in the previous section. 

Let us stress again that the MBH evolution through 
cosmic history is followed self-consistently in the hybrid 
models. This means that the predicted theoretical dis- 
tribution is not, in general, described as a simple mixing 
of the form given by Eqs. (fl~8"|) or (flU)) . This is a crucial 
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NAME 


Mixing 


h 


h 


h 


h 


h fit 


h fit 


h fit 


fl fit 


fi + ft fit 


p-I 


P 


0.15 


- 


0.85 


- 


0.15 ±0.05 


- 


0.85 ±0.05 


- 


- 


p-II 


P 


0.54 


- 


0.46 


- 


0.55 ±0.1 


- 


0.45 ±0.1 


- 


- 


p-III 


P 


0.41 


0.13 


0.12 


0.34 


0.3 ±0.2 


0.25 ±0.25 


0.1 ±0.05 


0.35 ±0.1 


0.6 ±0.05 


p-IV 


P 


0.11 


0.49 


0.22 


0.18 


0.29 ±0.29 


0.3 ±0.3 


0.21 ±0.05 


0.2 ±0.05 


0.4 ±0.05 


AM 


Af 


0.23 




0.77 




0.2 ±0.1 




0.8 ±0.1 






Af-II 


Af 


0.61 




0.39 




0.6 ±0.15 




0.4 ±0.15 






Af-m 


Af 


0.31 


0.16 


0.23 


0.3 


0.4 ±0.2 


0.1 ±0.1 


0.2 ±0.05 


0.3 ±0.05 


0.5 ±0.05 


Af-IV 


Af 


0.08 


0.22 


0.56 


0.14 


0.15 ±0.15 


0.15 ±0.15 


0.5 ±0.1 


0.2 ±0.1 


0.3 ±0.05 



TABLE III. "Artificially mixed" models. Columns 3-6 list the mixing parameters used to generate the models. Columns 7-10 
list the best-fit values recovered by our analysis (see Section W\ . 



point: the success of this experiment will tell us that we 
can extract valuable information on complex MBH for- 
mation scenarios by mixing a set of "pure" models based 
on simple recipes. 



V. RESULTS FOR THE MIXED MODELS 

In the context of mixed models, we are no longer com- 
paring two pre-assigncd models A and B as descriptions 
of the observational data. We deal instead with a single, 
continuous parameter space of models, where the param- 
eters are the mixing fractions of some subset of "pure" 
models. For example, if we mix models 1 and 3, we have 
a one-dimensional parameter space given by the contri- 
bution of model 1 (/i) to the total population (the contri- 
bution of model 3 is fixed by the constraint f\ + fz = !)• 
Given a particular observation, we can then compute the 
posterior probability distribution function (PDF) given 
by Bayes theorem, Eq. (fT2"j) , for the mixing fractions. The 
computation of the posterior can be done either over a 
grid of points in the parameter space, or by exploring 
the parameter space by means of Markov Chain Monte 
Carlo simulations (which become much more practical as 
the dimension of the parameter space increases). 

For each mixed model, 100 different realizations of the 
LISA data were generated and a posterior probability dis- 
tribution for the mixing fractions was obtained for each 
one. The width of the posterior in a single realization 
reflects how well that particular data set can constrain 
the mixing fractions. The location of the peak of the 
posterior will change from realization to realization, but 
we would expect the width to remain approximately the 
same. We also expect that the distribution of the location 
of the peak of the posterior over many realizations should 
resemble the posterior for the mixing fractions computed 
in any single realization. 

We considered a total of eight different mixed mod- 
els, as listed in Table IIII1 mixing either just VHM-noZ- 
Edd-co and BVR-noZ-Edd-co or these two models plus 
VHM-Z-Edd-co and BVR-Z-Edd-co. For each case, we 
assumed that we were using three years of LISA data, 



but made pessimistic assumptions (T2) for the transfer 
function. While this latter assumption is slightly conser- 
vative, we checked that there was not much difference in 
performance when using the most optimistic assumptions 
(i.e., the transfer function T3). 

The posterior distributions of the mixing fraction 
found in one particular realization each of models Af-I 
and Af-II are shown in the left panel of Figure [TO] The 
PDFs peak around /1 = 0.2 ± 0.05 for model Af-I and 
fx = 0.55 ± 0.1 for model Af-II, which is consistent with 
the injected fractions listed in Table [TTTT 

As mentioned above, we expect the peak and width of 
the posterior PDF to fluctuate from realization to real- 
ization. To assess the statistical robustness of this result 
we therefore repeat the experiment. In the right panel 
of Figure [lOl we plot the distribution of the location of 
the peak of the posterior PDF found in each of 100 re- 
alizations of the models. As we expect, the widths of 
these distributions are very similar to the PDFs found 
in each of the individual realizations. The distribution 
peaks around f\ = 0.2 ± 0.05 for model Af-I and around 
fi = 0.55 ± 0.1 for model Af-II, in agreement with the 
true injected fractions listed in Table IIIII This experi- 
ment shows that most of the time we can correctly infer 
the relative contribution of the two models, but there is 
still the possibility to draw erroneous conclusions from 
a single observation. For example, in two realizations of 
model Af-II we would prefer an almost pure VHM model, 
while the underlying distribution is in fact 45% BVR. 
However, in these cases the posterior PDF is also very 
wide, which would be an indicator that the data set was 
not placing particularly good constraints on the model in 
that specific case. 

Figure [TT] shows the results for the more complex case 
of model A/-III, where all four of the pure models were 
mixed and the mixing parameter space is three dimen- 
sional. Again, both the posterior PDFs given by a spe- 
cific realization (left panel) and the distribution of the 
peak values of the posterior PDFs in a sample of 100 
realizations return mixing fractions which are consistent 
with the injected values (see Table [TTT]) . However, in this 
case the width of the VHM-noZ and VHM-Z posterior 
PDFs is large (~ 0.2), indicating a certain degree of de- 
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FIG. 10. Summary of the model mixing results for models Af-I and JV-II. In both panels, the horizontal axis shows the mixing 
fraction for model VHM-noZ-Edd-co. The left panel shows the posterior probability distribution function for this mixing fraction 
as found in one particular realization of each model. The right panel shows how the peak of the posterior was distributed over 
100 different realizations of each of the two models. The vertical lines show the true values of the mixing fraction. 




FIG. 11. Summary of model mixing results for model A/-III. We show the mixing fraction of the given model on the horizontal 
axis. The left panel shows the posterior probability distribution of the mixing fraction for each of the four models, VHM, BVR, 
VHM-Z and BVR-Z, found when analyzing a single realization of model M- III. The central panel shows the same thing, but 
now considering the fractions of BVR, BVR-Z and the sum VHM+ VHM-Z in the mixed model. The right panel shows the 
distribution of the peak of the posterior pdf found over 100 different realizations of the A/"-III model. 



generacy between those two models. If we consider the 
sum, then the posterior PDF is much narrower and peaks 
at the right value (/i + /s = 0.47, sec Table [TTTJ) , show- 
ing that there is much less degeneracy between the VHM 
and the BVR models. This is also nicely shown by the 
two dimensional posterior PDFs plotted in Figure [T2] 
All the ellipse contours have principal axes more or less 
directed along the x and y axes, with the exception of 
the VHM-noZ versus VHM-Z PDF, which shows a clear 
anticorrelation between those two fractions. 

Although we focused on the Af models, the same level 
of accuracy in the determination of the mixing fractions is 
achieved for p models. The results are collected in Table 
IIIII The results shown in the table refer to the most 
pessimistic transfer function; slightly better constraints 
on the mixing fractions can be obtained if we assume two 
operational interferometers and p t hr = 8. 

As a final step we present our results for the consis- 
tent mixing model. In the two hybrid models HY-I and 
HY-II, VHM-Z and BVR-Z seeding prescriptions are si- 



multaneously employed in a consistent way in the merger 
trees, and we do not expect the resulting binary popu- 
lation to be perfectly reproduced by any combination of 
our pure models. The test here is to combine the two 
"pure" VHM-Z (i = 3) and BVR-Z (i = 7) models to see 
if we can mimic the true distribution with some combi- 
nation of the two. We proceed exactly as for the artificial 
mixing, by recovering the posterior PDF of the mixing 
parameter. In this case we used only the p-type mixing 
model, and included maginalization over the total num- 
ber of events as given by Eq. (|16l) . The rationale for 
this was that we thought a consistent mixed model of 
this type would not necessarily have the same number 
of events as the underlying models, and so we wanted 
to eliminate bias that would be introduced by using the 
number-of-event information. We also computed results 
using the A/"- type mixing and/or not marginalizing over 
the number. These results were also reasonable, but the 
match between the intrinsic and recovered distributions 
was not as good. 
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FIG. 12. Two-dimensional marginalized posterior PDFs obtained from a single realization of model TV-Ill. Each plot shows the 
mixing fraction of one model component against the mixing fraction of a second component. The models are numbered from 
1, 3, 5 and 7, corresponding to VHM, VHM-Z, BVR and BVR-Z respectively, as in Table Q] The top row shows comparisons 
between model 1 (horizontally) and models 3, 5 and 7, respectively. The bottom row shows comparisons 3 to 5, 3 to 7 and 5 
to 7, respectively. Note that the individual components of VHM and VHM-Z are poorly constrained, which is why the plots 
involving VHM models have larger correlation contours than the BVR to BVR-Z comparison (bottom right). 



For p-type mixing with marginalization over number, 
we find that model HY-I and H Y- II a,re best reproduced 
by setting / 3 = 0.85 and fa = 0.45, respectively. The 
marginalized mass and redshift distributions of the best- 
fitting model are shown as red lines for model HY-I in 
Figure 1131 As expected, we can not perfectly match 
the true model distribution, but the overall agreement is 
good. Even though there is no "true answer" in this case, 
we can still extract useful information about the models. 
For example, we can confidently say that in model HY- 
II the contribution of the heavy seeding process is much 
higher than in model HY-I. This is consistent with the 
fact that model HY-II assumes a much more efficient 
quasistar formation prescription. 



VI. CONCLUSIONS 

In this paper we explored the "inverse problem" for 
GW observations, which is fundamental in assessing the 
possible astrophysical impact of GW astronomy. The 
question we addressed in this paper was: given a sample 
of observed MBHB coalescences ( with relative parameter 
estimation errors), what astrophysical information about 
the physical processes governing their formation and cos- 
mological evolution can we extract from the observations? 



More informally: are GW observations a valuable tool 
for astrophysics? We answered this question by applying 
the statistical framework of Bayesian model selection to 
simulated LISA observed datasets. We chose LISA as a 
case study, but the analysis could straightforwardly be 
generalized to any other GW detector. 

We considered ten different "pure" MBH formation 
and evolution models (see Table [J) differing in certain 
key aspects of the input physics, specifically: (i) the 
seed formation mechanism, (ii) the redshift distribution 
of the first seeds, (iii) the accretion efficiency during each 
merger and (iv) the geometry of accretion (see Section 
III A[) . For each model we computed the intrinsic coales- 
cence distributions d 3 Ni/dMdqdz. We then constructed 
the theoretically observable distributions by filtering the 
intrinsic distributions with four transfer functions Tj. 
These transfer functions account for different levels of 
completeness of the LISA observations according to four 
different sets of assumption about the performance of the 
LISA detector (Section III Bp . For each model we gen- 
erated 1000 observed datasets (including observational 
errors), and we analyzed them using a Bayesian model 
selection framework to assess their distinguishability as 
a function of the detector performance and of the dura- 
tion of the dataset used for the model comparison. We 
find that: 
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FIG. 13. Comparison of intrinsic and "recovered" distri- 
butions for the consistently mixed models. In each panel, 
the thick solid black curve shows the intrinsic distribution of 
mergers in the model, the red solid line shows the "recovered" 
distribution, which is a sum of the VHM-Z and BVR-Z mod- 
els, weighted by the best-fit mixing parameter. The thin lines 
show the contributions to this recovered distribution from the 
VHM-Z (solid blue) and BVR-Z (dashed magenta) models. 
The left panels show the distributions of the masses of merg- 
ers, while the right panels show the distributions of the red- 
shifts of mergers. The upper panels show the distribution 
for observed merger events, while the lower panels show the 
intrinsic distribution of mergers. 



• LISA will be able to discriminate among almost any 
pair of such "pure" models, even under pessimistic 
assumptions about the detector performance, after 
only one year of operation (see Table [II]). In partic- 
ular, it will be easy to identify the mass and redshift 
distribution of the seeds, and the efficiency of the 
accretion mechanism. 

• Models differing only by their accretion geometry 
are more difficult to discriminate. However, this 
was partly a consequence of our choice to consider 
measurements of only three parameters for each in- 
spiralling binary (mass, mass ratio and redshift), 
i.e., we ignored the information encoded by MBH 
spins and in the mcrgcr/ringdown. Including spins 
in the analysis will probably make such models 
easily distinguishable, as demonstrated in a simi- 
lar study by Plowman et al. [HI]. In any case, 
even without the extra information carried by the 
spins, we can discriminate between these models 
if we consider the optimal LISA performance and 
three years of observation. 



• The impact of the detector performance on the 
analysis is relatively mild. This is because lowering 
the threshold to p t hr = 8 and considering two in- 
terferometers only adds a small number of sources 
to the detected sample, and only slightly improves 
parameter estimation. 

• Not surprisingly, the length of the observation is 
important, as the expected number of MBHBs in 
the sample increases linearly with the observation 
time. To give a specific figure of merit, with a 
three-year observation window we have more than a 
90% probability that the parent model of an observed 
sample will be safely identified at a two-sigma con- 
fidence level (95%). 

To go beyond the pure model analysis, we considered 
the possibility of model mixing. First we created new, 
"artificial" models by mixing the coalescence distribution 
functions of different "pure" models (namely, models 1, 
3, 5 and 7, see TablcslIland lIII[) . We used pure models dif- 
fering in their seeding mechanism and in the redshift dis- 
tribution of the seeds (different metallicity "feedback"). 
The new models are characterized by the fractions fi of 
the "pure" models used in the mixing, with the constraint 
Y]. fi = 1. Then we considered two hybrid models, where 
halos were simultaneously seeded according to the VHM- 
Z (i = 3) and the BVR-Z (i = 7) prescription, and the 
evolution of the seeds was self-consistently followed in 
the halo hierarchy. We produced several LISA observed 
datasets for both the artificial and the hybrid models. We 
then tried to recover the combination of "pure" models 
that best reproduces each mixed model by maximizing 
over the posterior probability distribution function of the 
likelihood (Section EJ). We find that: 

• When the "pure" models used in the mixing differ 
only in their seeding prescription ( VHM vs B VR) , 
so that we have only a single mixing parameter 
(since fi = 1), we can correctly infer this mixing 
parameter with an accuracy of about 10%. 

• When we mix four different models we can still in- 
fer the mixing fractions with the same accuracy, but 
there is a certain degree of degeneracy between the 
two VHM models (i = 1 and i = 5); i.e., the effect 
of metallicity "feedback", as the detectable MBH 
population does not differ much between these two 
models. However, the fraction f\ + /s is very well 
constrained, and we can clearly distinguish the rel- 
ative contribution from the different seeding mech- 
anisms. 

• Finally, we can also get a fairly good match to the 
hybrid models by combining "pure" models. This 
is probably the most important result of our analy- 
sis. The formation and merger history of MBHs is a 
complex process, involving several physical ingredi- 
ents which arc poorly understood, and it is difficult 
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to imagine that we will have a comprehensive theo- 
retical understanding of the underlying physics be- 
fore LISA flics. However, we will certainly be able 
to construct a set of models based on simple physi- 
cal prescriptions that can be tested against the ob- 
servations. Our experiment with the hybrid models 
demonstrates that we can extract valuable informa- 
tion about more complex MBH formation scenarios 
by mixing a set of "pure" models based on simple 
recipes. 

The use of a Bayesian framework is crucial for the 
model mixing results, since it allows us to recover a pos- 
terior probability distribution for the "mixing parame- 
ters" that characterizes the fraction of each model con- 
tributing to the observed data set. In this respect, our 
analysis goes considerably beyond the work recently pre- 
sented in Ref. j32| , where only pure models were con- 
sidered and the statistical analysis was based on two di- 
mensional Kolmogorov-Smirnov tests performed on the 
distributions of pairs of measured parameters. 

Despite this improvement, the building blocks of the 
present work can be improved in many ways. The set of 
distinct "pure" models can not be representative of all 
the physical complexity of the problem. A more power- 
ful approach to MBH population modelling would be to 
describe the relevant physics using a set of continuous pa- 
rameters representing the critical features of the models 
(seed mass function, accretion efficiency and so on), and 
then attempt to measure those parameters by perform- 
ing a similar Bayesian analysis. We have also adopted 
several simplifying assumptions about the GW observa- 
tions, which can be refined by developing a more realistic 
model for the GW signal, including spins, higher harmon- 
ics, merger and ringdown. We can then attempt a more 
sophisticated analysis and explore the posterior proba- 
bility distribution function in a larger and more complex 
parameter space, to maximize the recovered information. 
All these issues should be explored in the future. 

Besides the scientific impact of a GW detection in and 
by itself, the ambitious goal of doing GW astronomy re- 
quires that we maximize the astrophysical information 
that will be extracted from such detections. In this re- 
spect, addressing the "inverse problem" in GW astron- 
omy is extremely important. In this paper we have made 
a first, small step in this direction. We hope that our 
work will encourage relativists and GW astronomers to 
consider in greater depth the astrophysical impact of GW 
detections. At the same time, we hope to convince "ordi- 
nary" astronomers that GWs can be an important tool, 
not only for tests of general relativity and as a laboratory 
for fundamental physics, but also in astrophysics. 
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Appendix A: Error modelling 

We describe here how measurement errors are included 
in the analysis. Errors arise due to instrumental noise in 
the LISA detector, and from the transformation between 
different coordinates. The error propagation expressions 
described below are probably not new, and the end re- 
sult is expected, but we include the derivation here for 
completeness and to clarify the underlying assumptions. 

LISA observations will determine the luminosity dis- 
tance to a given source, but we want to characterize the 
source by the redshift instead. The conversion can be 
done using the concordance cosmology at the time LISA 
flies, but this introduces additional errors, since the cos- 
mological parameters will be known imperfectly. Suppose 
we want parameters x which are given by the measured 
parameters, y, and a transformation x(y; A) that depends 
on some imperfectly known parameters, A. Suppose fur- 
ther that the probability distribution for A is L(X) and 
that for y is Y(y). The probability distribution for x is 
then 



X(x)= L(\)Y(y(x-,\))J(x-,\)d n >\, 



(Al) 



in which J(x; A) is the Jacobian for the transformation 
between y and x and n\ is the dimensionality of the A 
parameter space. We now make two simplifying assump- 
tions: (i) the distributions of the errors in y and A are 
multi-variate Gaussians with inverse variance-covariance 
matrices T v and F A respectively; (ii) the errors are small, 
so that the distributions are peaked near the true values 
of yo and Ao- This latter assumption means that we can 
use a linear approximation in the interesting region of 
the distributions 

-* Qy^ Qy% 

yi(x; A) » y l (x ; A ) + q^( x 3 ~ x oj) + g^i^i _ > 

(A2) 

where the derivatives are evaluated at yo, Ao- We can also 
ignore the Jacobian in the integrand of Eq. (|A1[) . since it 
will be approximately constant across the domain of inte- 
gration and therefore it plays the role of a normalization 
factor. Using the notation 

- _ x\ - \ \ _ ®y* r>A _ d v* 

Xi — Xi — XQi, OAi — Ai — Aoi, U^j — ~Qx~ ^ — dX ' 

(A3) 

we see that the integrand is proportional to the exponen- 
tial of 



1 {((Fx) T + SX T D X ) T y (D X 5L + (D X ) T 6X) 



<5A T r A <5A} , 



(A4) 
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where matrix notation is being used. This can be rear- 
ranged to give 



- \{{D x Z) T T y D x 5t - a T (r A + D x T y (D x ) T a 

+ (SX + af (r A + D x T y {D x ) T ) (SX + a)} (A5) 

where 



r + D T y (D ) J D F y D Xj k . 



(A6) 



The term on the second line of Eq. ()A5j) is just a Gaus- 
sian, whose centre has been shifted to a, and with 
variance-covariance matrix that is independent of x. 
When we integrate over the distribution of A, i.e., over 
6X, we find the probability distribution is proportional to 
the exponential of 



- ^ {(D x x) T T y D J x - a T (r A + D x T y (D x ) T a} , (A7) 



which is a multi-variate Gaussian with variance- 
covariance matrix T x equal to 

{D x f (v y - Y y {D x f (r A + D x Y y {D x ) T y 1 D x T y ^j D x . 

. (A8) 

Although this expression looks complicated, the inverse 
of this matrix takes the simple form 

(r 1 )- 1 = (d x ) -1 ((r*)- 1 + (D A ) T (r A )- 1 L» A ) {{d x ) t )-\ 

(A9) 

As it is this inverse matrix which determines the width 
of the distributions, we see that it takes the form we 
might expect, i.e., the error is the sum of the error con- 
tributions from the instrumental noise, Y y , and that from 
the uncertainty in the cosmological parameters, T A . The 
remaining terms just propagate the errors through the 
transformation in the standard way. 

In this paper, we estimate the errors in the observed 
parameters, T y , using the Fisher matrix formalism of 
Ref. [47j . These errors are given in terms of the chirp 
mass, M., the amplitude, A, and the symmetric mass ra- 
tio, 77, so we must transform these coordinates to mass, 
M. luminosity distance, Dl, and mass ratio, q. We con- 
vert luminosity distance to redshift by inverting the stan- 
dard cosmological relation of Eq. We assume that 
there are errors in Hq and il\ but enforce flatness (i.e., 
Om + ^A = 1). 

The diagonal components of the total error matrix in 



the new variables, (T x ) 1 , are then 

V /lnqlnq ^ 4.77 ^ 'In 77 In 77 



(nr. 1 



M In M 



(r *v 



(nrn 1 



ffi 



MlnM 
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(P 



r 1 

' In rj In 77 



-1 

In A4 In 77 ' 



Dl 



In M In M 



TO 1 



A In A 



3^ 



'lnA4 In A 



Affg 
Hi 



(A10) 



where (r 27 )- denotes the components of the inverse 
Fisher matrix as given in Ref. [47[, and AHq, Ail a are 
the errors in the cosmological parameters at the time 
of the LISA mission, which we take to be AHo/Hq = 
AJ7a/J7a = 0.01. The off-diagonal components in the to- 
tal error matrix come only from the rotation of the noise 
error matrix, (D x )~ 1 (T y )~ 1 ((D x ) T )~ 1 , but in practice 
we ignore these and draw errors based on the diagonal 
variance-covariance matrix with components as above. 
This is conservative, in that it approximates the error el- 
lipse by a bounding circle, but we have also checked that 
our results did not significantly change when they were 
recomputed using the full error model including correla- 
tions. 

We note that there is a singularity in the transforma- 
tion between 77 and q when q = 1 (7/ = 0.25, which is 
indicated by the divergence of (r*)^ 1 there). The proba- 
bility that two galaxies with exactly the same black hole 
mass merge is zero, and so this was not a problem in 
practice. 

Additional errors arise from the effects of weak lens- 
ing. This means that the apparent luminosity distance, 
cIa, of a source at the Earth is changed by a factor \i 
from the true luminosity distance, dx, so that d,A = /idr- 
If we assume that the weak-lensing (dc-)magnification is 
drawn from a Gaussian distribution (which is a reason- 
able approximation in the wcak-lcnsing limit, although 
not for stronger lensing), we can use the preceding ar- 
guments in this case as well. The parameters x are the 
parameters we assign to the source, which are the same 
as the measured parameters y. However, for a fixed value 
of /_t, the distribution of y is centred at a luminosity 
distance fidx- When we marginalize over possible val- 
ues of /j,, we end up with an integral of the form (|A1|) . 
but the dependence of the integrand on the unknown 
parameter is through the centre of the distribution, yo, 
rather than through the mapping to x. We can follow 
the same arguments as before, but replace the deriva- 
tives in equation (|A2|) by derivatives of y — yo- The end 
result then takes the same form. If the distribution of fx 
is a Gaussian exp(— r w ,(/j — l) 2 /2), we find the (r^)^ 1 
component must be corrected by addition of d^/T^^. In 
practice, we take the weak-lensing error estimate Az w \ 
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from Ref. (55j . and directly modify the zz component as 

(r- 1 ),, -> (r-!) zz + (Az wl ) 2 . 

While we formulated the above in terms of computing 
the distribution of errors in the parameters we measure 
from our observation, the same framework can be applied 
to the analysis of the real LISA data set. Once we obtain 
a posterior PDF for the measured waveform parameters, 
Y(y), we can combine this with a posterior on the cos- 
mological parameters and on the lensing magnification 
distribution through Eq. (|A1I) to derive the posterior on 
the inferred source parameters x. With the assumption 
that these measured posteriors are Gaussians, the final 
result will take the same form. 



Appendix B: Effect of LISA motion on SNR and 
estimation of intrinsic parameters 

In this paper we tried to provide conservative estimates 
of the astrophysical potential of LISA through observing 
MBH binaries. For this reason we only considered MBHB 
inspiral waveforms in the restricted post-Newtonian ap- 
proximation. The choice of simple, frequency-domain 
waveforms has the advantage that it significantly speeds 
up parameter estimation calculations: we can easily com- 
pute the SNR and parameter estimation accuracy of 
~ 10 6 binaries in one day on a single processor, some- 
thing that would be impossible if we used time-domain 
waveforms including spin precession. Computational re- 



quirements were indeed a limiting factor in the analysis 
by Plowman et al. [HJ , who used an advanced parameter 
estimation code developed by the Montana group [l3| . 

In Ref. J!?}, the potential of LISA to estimate bi- 
nary parameters was assessed using two independent for- 
malisms. In one case ("non angle-averaged") the mo- 
tion of LISA was taken into account using the formal- 
ism developed by Cutler (49|; in the other case ("angle- 
averaged") the authors averaged over the LISA beam- 
pattern functions. The angle-averaging procedure re- 
moves all information related to the Doppler shift due 
to the motion of LISA around the Sun, so in the angle- 
averaged formalism it is impossible to estimate the dis- 
tance and angular location of the source in the sky. How- 
ever, it is still possible to obtain an "angle-averaged" es- 
timate of the SNR and of the intrinsic parameters of the 
source (for our nonspinning binary model there are only 
two of them: M and q, or equivalently, A4 and 77). 

In summary, there are two ways of assessing the param- 
eter estimation capabilities of LISA. In the first method 
we angle-average over pattern functions, then we esti- 
mate the SNR and the accuracy in determining (say) M. 
and 77. In the second method we perform a Monte Carlo 
sampling of the pattern function (by assuming that the 
source location and angular orientation in the sky are 
isotropically distributed). In this way we can estimate 
the SNR of each source, as well as the accuracy in deter- 
mining (say) M., 77, the luminosity distance Dl and the 
angular position of the source Afi. 
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Averaged logJA M^M^J, z = 0.5 Averaged logJA M Mrp M M J, z = 5 Averaged logJA M^M^J, z = 20 




log 10 q log 10 q " log,~q 



FIG. 15. Contours of the error on the chirp mass in the (M , q) plane at different redshifts (from left to right: z = 0.5 , 5 and 
20). The top row refers to the averaged code, the bottom row to the Cutler code with T b B = 3 yrs and two interferometers. 



In this Appendix we address the question: when these 
two procedures can be compared at all (i.e., in the case 
of Ai, r\ and the SNR), how are they related? If the 
angle-average over pattern functions provides a sensible 
estimate of SNRs and of the intrinsic binary parameters, 
it could provide a significant saving in terms of compu- 
tational time for future studies of MBH populations. 

In Figure [14] we show contour plots of the SNR in the 
(M , q) plane at selected values of the redshift, for both 
the averaged and the non-averaged cases. This plot is en- 
couraging: it shows that the shape of these contour plots 
is essentially identical. Indeed, a more careful analysis 
reveals that the ratio between the averaged and non- 
averaged SNRs is SNRa/SNRna — 1-3 and it is (to a 
very good approximation) independent of (M , q , z). 

Fisher matrix calculations are inaccurate when the 
SNR is not much larger than unity (see e.g. [52j). Fig- 
ure [13] can be used to identify regions where the SNR 
becomes so small that the Fisher matrix approach will 
fail, and other parameter estimation techniques (such as 
Markov Chain Monte Carlo) will become necessary. For 
example, by looking at the contour line with SNR p = 8 
we see that Markov Chain Monte Carlo calculations will 
be necessary to estimate the parameter estimation errors 
for mergers of low-mass black holes at high redshift. In 
this context, recall that M = (l+z)M r , so (for large red- 
shifts) the total mass in the source frame M r is smaller 
than the mass M appearing on the y axis of the contour 



plots. 

Can we find more empirical relations between the 
pattern-averaged formalism and the Cutler approach? 
Figure [15] shows contour plots of the chirp mass deter- 
mination accuracy in the two formalisms. Once again, 
we see that there is indeed an approximate proportion- 
ality between mass estimation accuracies in the two ap- 
proaches. The ratios of the chirp mass errors show small 
random fluctuations consistent with the Poisson noise in 
the 10 3 Monte Carlo realizations at each point, but it is 
clear that the angle-averaged approach does a good job 
at predicting the SNR and the intrinsic parameter errors. 
This is true at any redshift. Indeed, we find that ratios 
in the errors on A4 and r\ are pretty much redshift in- 
dependent, and they show a very mild variation (in the 
range ~ 1 — 1.2) in the (M ,q) plane. If a similar pro- 
portionality applies also to estimates of the MBH spins, 
the pattern-averaging may turn out to be a very useful 
simplication for future MBH population studies. 

To finish, we will point out an interesting trend in 
the expected accuracy of angular resolution. We are not 
aware of systematic calculations of the angular resolution 
in the three dimensional (M ,q ,z) parameter space, so we 
present such a study in Figure [TBI Angular resolution de- 
grades with redshift, as expected. The plot shows that, 
for any given redshift, the angular resolution accuracy 
has a bimodal distribution - i.e., there are two islands of 
good angular resolution accuracy in the (M , q) plane. In 




hindsight, this is not too surprising: the "lower" island 
corresponds to low-mass binaries from which the GW 
emission is in the optimal sensitivity bucket of LISA; the 
"upper" island correspond to higher-mass binaries that 
merge at lower frequency, but have SNR large enough 



to compensate for the relatively small number of cycles 
spent in band. It will be interesting to verify if such a 
bimodal distribution persists when the merger/ringdown 
signal is also included in the analysis. 
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